Learning Bimodal Structure in Audio–Visual Data

Similar resources

Learning Bimodal Structure in Audio-Visual Data

A novel model is presented to learn bimodally informative structures from audio-visual signals. The signal is represented as a sparse sum of audio-visual kernels. Each kernel is a bimodal function consisting of synchronous snippets of an audio waveform and a spatio-temporal visual basis function. To represent an audio-visual signal, the kernels can be positioned independently and arbitrarily in...

Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection

Voice activity detection (VAD) is an important preprocessing step in speech-based systems, especially for emerging hands-free intelligent assistants. Conventional VAD systems relying on audio-only features are normally impaired by noise in the environment. An alternative approach to address this problem is to use audiovisual VAD (AV-VAD) systems. Modeling timing dependencies between acoustic and visual ...

Anatomical Structure Sketcher for Cephalograms by Bimodal Deep Learning

Lateral cephalogram X-ray (LCX) images are essential to provide patient-specific morphological information of anatomical structures. The automatic annotation of anatomical structures in cephalograms has been performed in biomedical engineering for nearly twenty years. Most systems only handle a portion of the salient craniofacial landmark set [1, 2, 3]. Although model-based methods can produce a...

Maximum Covariance Unfolding: Manifold Learning for Bimodal Data

We propose maximum covariance unfolding (MCU), a manifold learning algorithm for simultaneous dimensionality reduction of data from different input modalities. Given high dimensional inputs from two different but naturally aligned sources, MCU computes a common low dimensional embedding that maximizes the cross-modal (inter-source) correlations while preserving the local (intra-source) distance...

Learning Multi-modal Dictionaries: Application to Audiovisual Data

This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Simultaneous processing of multi-modal data can reveal information that is unavailable when handling the sources separately. However, in natural high-dimensional data, the statistical dependencies between modalities are, most of the time, not obvious. Learning fundamental multi-modal pat...

Journal

Journal title: IEEE Transactions on Neural Networks

Year: 2009

ISSN: 1045-9227, 1941-0093

DOI: 10.1109/tnn.2009.2032182